Learning Residual Finite-State Automata 
Using Observation Tables 



Anna Kasprzik 

FB IV, University of Trier 
kasprzik@informatik.uni-trier.de

We define a two-step learner for RFSAs based on an observation table by using an algorithm for 
minimal DFAs to build a table for the reversal of the language in question and showing that we can 
derive the minimal RFSA from it after some simple modifications. We compare the algorithm to two 
other table-based ones of which one (by Bollig et al. [8]) infers a RFSA directly, and the other is
another two-step learner proposed by the author. We focus on the criterion of query complexity.

1 Introduction

The area of grammatical inference tackles the problem of inferring a description of a formal language 
(a grammar, an automaton) from given examples or other kinds of information sources. Various settings 
have been formulated and quite a lot of learning algorithms have been developed for them. One of the 
best studied classes with respect to algorithmic learnability is the class of regular languages.

A significant part of these algorithms, of which Angluin's L* [1] was one of the first, use the concept
of an observation table. If a table fulfils certain conditions we can directly derive a deterministic finite- 
state automaton (DFA) from it, and if the information suffices this is the minimal DFA for the language 
in question. 

In the worst case the minimal DFA has exponentially more states than a minimal NFA for a language
$L$, and as a small number of states is desirable for many applications it seems worth considering
whether we can obtain an NFA instead. Denis et al. [4] introduce special NFAs - residual finite-state automata
(RFSAs) - where each state represents a residual language of L. Every regular language has a unique 
minimal RFSA. Denis et al. give several learning algorithms for RFSAs [HHUGl, which, however, all 
work by adding or deleting states in an automaton. 

We define a two-step learner for RFSAs based on an observation table by using an algorithm for 
minimal DFAs to build a table with certain properties for the reversal of the language L and showing that 
we can derive the minimal RFSA for L from this table after some simple modifications. We compare the 
algorithm to two other table-based ones of which one is an incremental Angluin-style algorithm by Bollig
et al. [8] which infers a RFSA directly, and the other is another two-step algorithm proposed below. The
comparison mainly focuses on query complexity. We find that in theory the algorithm in [8] does not
outperform the combination of known algorithms inferring the minimal DFA with the modifications we
propose (although it is shown in [8] that their algorithm behaves better in practice).

2 Basic notions and definitions 

Definition 1 An observation table is a triple $T = (S, E, obs)$ with $S, E \subseteq \Sigma^*$ finite, non-empty for some
alphabet $\Sigma$ and $obs : S \times E \to \{0, 1\}$ a function with $obs(s,e) = 1$ if $se \in L$, and $obs(s,e) = 0$ if $se \notin L$.
The row of $s \in S$ is $row(s) := \{(e, obs(s,e)) \mid e \in E\}$, and the column of $e \in E$ is $col(e) := \{(s, obs(s,e)) \mid s \in S\}$.
$S$ is partitioned into two sets RED and BLUE where $uv \in \mathrm{RED} \Rightarrow u \in \mathrm{RED}$ for $u, v \in \Sigma^*$
(prefix-closedness), and $\mathrm{BLUE} := \{sa \in S \setminus \mathrm{RED} \mid s \in \mathrm{RED}, a \in \Sigma\}$.

Definition 2 Let $T = (S, E, obs)$ with $S = \mathrm{RED} \cup \mathrm{BLUE}$. Two elements $r, s \in S$ are obviously different
(denoted by $r <> s$) iff $\exists e \in E$ such that $obs(r,e) \neq obs(s,e)$. $T$ is closed iff $\neg\exists s \in \mathrm{BLUE} : \forall r \in \mathrm{RED} : r <> s$.
$T$ is consistent iff $\forall s_1, s_2 \in \mathrm{RED}$, $s_1a, s_2a \in S$, $a \in \Sigma$ : $row(s_1) = row(s_2) \Rightarrow row(s_1a) = row(s_2a)$.

Definition 3 A finite-state automaton is a tuple $\mathcal{A} = (\Sigma, Q, Q_0, F, \delta)$ with finite input alphabet $\Sigma$, finite
non-empty state set $Q$, set of start states $Q_0 \subseteq Q$, set of final accepting states $F \subseteq Q$, and a transition
function $\delta : Q \times \Sigma \to 2^Q$.

If $Q_0 = \{q_0\}$ and $\delta$ maps at most one state to any pair in $Q \times \Sigma$ the automaton is deterministic (a
DFA), otherwise non-deterministic (an NFA). If $\delta$ maps at least one state to every pair in $Q \times \Sigma$ the
automaton is total, otherwise partial.

The transition function can always be extended to $\delta : Q \times \Sigma^* \to 2^Q$ defined by $\delta(q, \varepsilon) = \{q\}$ and
$\delta(q, wa) = \delta(\delta(q, w), a)$ for $q \in Q$, $a \in \Sigma$, and $w \in \Sigma^*$.

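Let $\delta(Q', w) := \bigcup\{\delta(q, w) \mid q \in Q'\}$ for $Q' \subseteq Q$ and $w \in \Sigma^*$. A state $q \in Q$ is reachable if there is
$w \in \Sigma^*$ with $q \in \delta(Q_0, w)$. A state $q \in Q$ is useful if there are $w_1, w_2 \in \Sigma^*$ with $q \in \delta(Q_0, w_1)$ and
$\delta(q, w_2) \cap F \neq \emptyset$, otherwise useless.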

The language accepted by $\mathcal{A}$ is $\mathcal{L}(\mathcal{A}) := \{w \in \Sigma^* \mid \delta(Q_0, w) \cap F \neq \emptyset\}$.

From $T = (S, E, obs)$ with $S = \mathrm{RED} \cup \mathrm{BLUE}$ and $\varepsilon \in E$ derive an automaton $\mathcal{A}_T := (\Sigma, Q_T, Q_{T0}, F_T, \delta_T)$
defined by $Q_T = row(\mathrm{RED})$, $Q_{T0} = \{row(\varepsilon)\}$, $F_T = \{row(s) \mid obs(s, \varepsilon) = 1, s \in \mathrm{RED}\}$, and $\delta_T(row(s), a) =
\{q \in Q_T \mid \neg(q <> row(sa)), s \in \mathrm{RED}, a \in \Sigma, sa \in S\}$. $\mathcal{A}_T$ is a DFA iff $T$ is consistent. The DFA for a
regular language $L$ derived from a closed and consistent table has the minimal number of states (see [1],
Th. 1). This DFA is the canonical DFA $\mathcal{A}_L$ for $L$ and is unique.
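
The table notions of Definitions 1 and 2 can be made concrete with a small sketch. It is only an illustration under assumptions: the helper names (`make_table`, `row`, `is_closed`, `is_consistent`) and the toy target language (strings over {a, b} ending in "a") are hypothetical and not part of the algorithms cited here.

```python
from itertools import product

def make_table(red, blue, E, member):
    """Fill obs(s, e) for all s in RED ∪ BLUE and e in E via a membership test."""
    return {(s, e): int(member(s + e)) for s in list(red) + list(blue) for e in E}

def row(obs, s, E):
    """The row of s as a tuple of 0/1 entries, one per context in E."""
    return tuple(obs[(s, e)] for e in E)

def is_closed(obs, red, blue, E):
    """Closed: every BLUE row coincides with some RED row."""
    red_rows = {row(obs, r, E) for r in red}
    return all(row(obs, s, E) in red_rows for s in blue)

def is_consistent(obs, red, blue, E, alphabet):
    """Consistent: RED elements with equal rows have equal successor rows."""
    S = set(red) | set(blue)
    for s1, s2 in product(red, repeat=2):
        if row(obs, s1, E) != row(obs, s2, E):
            continue
        for a in alphabet:
            if s1 + a in S and s2 + a in S and \
               row(obs, s1 + a, E) != row(obs, s2 + a, E):
                return False
    return True

# Toy target (an assumption for illustration): strings over {a, b} ending in "a".
member = lambda w: w.endswith("a")
red, blue, E = ["", "a"], ["b", "aa", "ab"], [""]
obs = make_table(red, blue, E, member)
```

With this table both checks succeed, and the two distinct RED rows correspond to the two states of the minimal DFA for the toy language.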

The Myhill-Nerode equivalence relation $\equiv_L$ is defined for $r, s \in \Sigma^*$ by $r \equiv_L s$ iff $re \in L \Leftrightarrow se \in L$ for all $e \in \Sigma^*$.
The index of $L$ is $I_L := |\{[s_0]_L \mid s_0 \in \Sigma^*\}|$ where $[s_0]_L$ is the equivalence class under $\equiv_L$ containing $s_0$.

Theorem 1 (Myhill-Nerode theorem - see for example [3])
$I_L$ is finite $\Leftrightarrow$ $L$ can be recognized by a finite-state automaton $\Leftrightarrow$ $L$ is regular.
$\mathcal{A}_L$ has exactly $I_L$ states, each of which represents an equivalence class under $\equiv_L$.

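Definition 4 The reversal $\overline{w}$ of $w \in \Sigma^*$ is defined inductively by $\overline{\varepsilon} := \varepsilon$ and $\overline{aw} := \overline{w}a$ for $a \in \Sigma$, $w \in \Sigma^*$.
The reversal of $X \subseteq \Sigma^*$ is defined as $\overline{X} := \{\overline{w} \mid w \in X\}$. The reversal of an automaton $\mathcal{A} = (\Sigma, Q, Q_0, F, \delta)$
is defined as $\overline{\mathcal{A}} := (\Sigma, Q, F, Q_0, \overline{\delta})$ with $\overline{\delta}(q', w) = \{q \in Q \mid q' \in \delta(q, \overline{w})\}$ for $q' \in Q$, $w \in \Sigma^*$.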
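
As a minimal sketch of Definition 4, the following code reverses an automaton by swapping the start and final sets and flipping every transition. The function names and the example DFA (words over {a, b} ending in "ab") are assumptions chosen for illustration only.

```python
from itertools import product

def reverse_nfa(starts, finals, delta):
    """Reverse an automaton given as delta[(state, symbol)] = set of successors."""
    rdelta = {}
    for (q, a), targets in delta.items():
        for t in targets:
            rdelta.setdefault((t, a), set()).add(q)   # flip q --a--> t into t --a--> q
    return set(finals), set(starts), rdelta            # start and final sets swap roles

def accepts(starts, finals, delta, word):
    """Standard subset simulation of a (possibly partial) NFA."""
    current = set(starts)
    for a in word:
        current = {t for q in current for t in delta.get((q, a), ())}
    return bool(current & set(finals))

# Hypothetical example: a DFA over {a, b} for the words ending in "ab".
delta = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {1},
         (1, "b"): {2}, (2, "a"): {1}, (2, "b"): {0}}
r_starts, r_finals, r_delta = reverse_nfa({0}, {2}, delta)
words = ["".join(w) for n in range(5) for w in product("ab", repeat=n)]
```

The reversed automaton accepts exactly the reversals of the accepted words, here the words starting with "ba", and it is in general non-deterministic even though the input is a DFA.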

Definition 5 The residual language (RL) of $L \subseteq \Sigma^*$ with regard to $w \in \Sigma^*$ is defined as $w^{-1}L := \{v \in
\Sigma^* \mid wv \in L\}$. A RL $w^{-1}L$ is called prime iff $\bigcup\{v^{-1}L \mid v^{-1}L \subsetneq w^{-1}L\} \subsetneq w^{-1}L$, otherwise composed.

By Theorem 1 the set of distinct RLs of a language $L$ is finite iff $L$ is regular. There is a bijection between
the RLs of $L$ and the states of the minimal DFA $\mathcal{A}_L = (\Sigma, Q_L, \{q_L\}, F_L, \delta_L)$ defined by $\{w^{-1}L \mapsto q' \mid w \in \Sigma^*,
\delta_L(q_L, w) = \{q'\}\}$.

Let $L_q := \{w \mid \delta(q, w) \cap F \neq \emptyset\}$ for a regular language $L \subseteq \Sigma^*$, some automaton $\mathcal{A} = (\Sigma, Q, Q_0, F, \delta)$
recognizing $L$, and $q \in Q$.

Definition 6 A residual finite-state automaton (RFSA) is an NFA $\mathcal{A} = (\Sigma, Q, Q_0, F, \delta)$ such that $L_q$ is a
RL of $\mathcal{L}(\mathcal{A})$ for all states $q \in Q$.

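Definition 7 The canonical RFSA $\mathcal{R}_L = (\Sigma, Q_R, Q_{R0}, F_R, \delta_R)$ for $L \subseteq \Sigma^*$ is defined by $Q_R = \{w^{-1}L \mid w^{-1}L$
is prime$\}$, $Q_{R0} = \{w^{-1}L \in Q_R \mid w^{-1}L \subseteq L\}$, $F_R = \{w^{-1}L \mid \varepsilon \in w^{-1}L\}$, and $\delta_R(w^{-1}L, a) = \{v^{-1}L \in Q_R \mid
v^{-1}L \subseteq (wa)^{-1}L\}$.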

$\mathcal{R}_L$ is minimal with respect to the number of states (see [4], Theorem 1).

Figure 1: An example for a coverable column (labeled by $e$)

3 Inferring a RFSA using an observation table 
3.1 A "parasitic" two-step algorithm 

The learner we define infers the canonical RFSA for L from a suitable combination of information 
sources. A source can be an oracle for membership queries (MQs; 'Is this string contained in the
language?') or equivalence queries (EQs; 'Is $A$ a correct automaton for $L$?' - yielding some $c \in
(L \setminus \mathcal{L}(A)) \cup (\mathcal{L}(A) \setminus L)$ in case of a negative answer) or a positive or negative sample of $L$ fulfilling
certain properties, and other kinds of sources can be considered as well. Suitable known combinations are:
an oracle for MQs and EQs (a minimally adequate teacher, or MAT), an oracle for MQs with positive
data, or positive and negative data.

In a first step we use an existing algorithm to build a table $T' = (\mathrm{RED}' \cup \mathrm{BLUE}', E', obs')$ representing
the canonical DFA for the reversal $\overline{L}$ of $L$. For eligible algorithms for various settings see [1] (L*, MAT
learning), [9] (learning from MQs and positive data), or [12] (this meta-algorithm covers MAT learning,
MQs and positive data, and positive and negative data, and can be adapted to other combinations). All
these learners add elements to the set labeling the rows of a table (candidates for states in $\mathcal{A}_{T'}$) until it
is closed, and/or separating contexts (i.e., suffixes revealing that two states should be distinct) to the set
labeling the columns until it is consistent - additions of one kind potentially resulting in the necessity
of the other and vice versa. Once the table is closed and consistent, they derive a DFA from it that
is either $\mathcal{A}_{\overline{L}}$ or can be rejected by a counterexample from the information sources, which is evaluated
to restart the cycle. Since the sources only provide information about $L$ and not $\overline{L}$, we must
minimally interfere by adapting data and queries accordingly: strings and automata have to be reversed
before submitting them to an oracle, samples and counterexamples before using them to construct $T'$.

In the second step we submit T' to the following modifications: 

(1) Only keep one representative for every distinct row occurring in the table in $\mathrm{RED}'$, and only keep
one representative for every distinct column in $E'$.

(2) Eliminate all representatives of rows and columns containing only 0s.
Let the resulting table be $T'' = (\mathrm{RED}'' \cup \mathrm{BLUE}'', E'', obs'')$.

(3) Eliminate all representatives of coverable columns, i.e., all $e \in E''$ with

$\exists e_1, \ldots, e_n \in E'' : \forall s \in \mathrm{RED}'' :$
$[obs''(s,e) = 0 \Rightarrow \forall i \in \{1, \ldots, n\} : obs''(s,e_i) = 0] \wedge$
$[obs''(s,e) = 1 \Rightarrow \exists i \in \{1, \ldots, n\} : obs''(s,e_i) = 1].$

For example, the column labeled by $e$ in Figure 1 would be eliminated because its 1s are all
"covered" by the columns labeled by $e_1$, $e_2$, and $e_3$.

Note that the first two modifications mainly serve to trim down the table to make the third modification 
less costly. In fact, most algorithms mentioned above can easily be remodeled such that they build tables
in which there are no rows or columns consisting of 0s and in which the elements labeling the rows in
the RED part are pairwise obviously different already such that no row is represented twice. 
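
The column part of modifications (1) to (3) can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: columns are given as 0/1 tuples over a fixed ordering of the RED elements, the name `prune_columns` and the example data are hypothetical, the covering columns are required to differ from the covered one, and the analogous pruning of rows is omitted.

```python
def prune_columns(columns):
    """columns: dict mapping a context label to its 0/1 tuple over a fixed RED ordering."""
    # (1) keep one representative per distinct column, (2) drop all-zero columns
    distinct = {}
    for label, col in columns.items():
        if any(col) and col not in distinct.values():
            distinct[label] = col

    # (3) a column is coverable if it is the componentwise OR of other columns
    #     that have a 0 wherever it has a 0
    def coverable(label):
        col = distinct[label]
        below = [c for other, c in distinct.items()
                 if other != label and all(x <= y for x, y in zip(c, col))]
        if not below:
            return False
        union = tuple(int(any(bits)) for bits in zip(*below))
        return union == col

    return {label: col for label, col in distinct.items() if not coverable(label)}

# Hypothetical data in the spirit of Figure 1: e is covered by e1, e2 and e3.
columns = {"e": (1, 1, 1, 0), "e1": (1, 0, 0, 0), "e2": (0, 1, 0, 0),
           "e3": (0, 0, 1, 0), "dup": (1, 0, 0, 0), "zero": (0, 0, 0, 0)}
kept = prune_columns(columns)
```

Because covering columns are always strictly below the covered one after deduplication, checking all columns against the same deduplicated set gives the same result as removing them one at a time.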

The table thus modified shall be denoted by $T = (\mathrm{RED} \cup \mathrm{BLUE}, E, obs)$ and the derived automaton
by $\mathcal{A}_T = (\Sigma, Q_T, Q_{T0}, F_T, \delta_T)$ with $F_T = F_{T'}$ (this has to be stated in case $\varepsilon$ has been eliminated). As we
have kept a representative for every distinct row and as all pairs of $\mathrm{RED}'$ elements that are distinguished
by the contexts eliminated by (3) must be distinguished by at least one of the contexts covering those as
well, $\mathcal{A}_T$ still represents $\mathcal{A}_{\overline{L}}$ (but without a failure state).

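We use $T$ to define $\mathcal{R} := (\Sigma, Q_R, Q_{R0}, F_R, \delta_R)$ with $Q_R = \{q \subseteq \mathrm{RED} \mid \exists e \in E : s \in q \Leftrightarrow obs(s,e) = 1\}$,
$Q_{R0} = \{q \in Q_R \mid \forall s \in q : obs'(s, \varepsilon) = 1\}$ ($obs'$ in case $\varepsilon$ has been eliminated), $F_R = \{q \in Q_R \mid \varepsilon \in q\}$, and
$\delta_R(q_1, a) = \{q_2 \mid q_2 \subseteq \overline{\delta_T}(q_1, a)\}$ for $q_1, q_2 \in Q_R$ and $a \in \Sigma$, where $\overline{\delta_T}$ is the transition function of the reversal
of $\mathcal{A}_T$.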
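
The construction of $\mathcal{R}$ from the pruned table can be sketched on a tiny example. The code below is a sketch under assumptions: the function names are hypothetical, the empty context is assumed to be still present in $E$, and the toy target is $L$ = words over {a, b} ending in "a", so that the table describes the reversal (words starting with "a") with the all-zero row of "b" already eliminated from RED.

```python
from itertools import product

def rfsa_from_table(red, E, obs, alphabet):
    """Build (Q_R, Q_R0, F_R, delta_R) from a pruned table for the reversed language."""
    row = lambda s: tuple(obs[(s, e)] for e in E)
    # delta_T of the (possibly partial) DFA: s --a--> the RED element whose row equals row(sa)
    delta_T = {}
    for s, a in product(red, alphabet):
        if all((s + a, e) in obs for e in E):
            for r in red:
                if row(r) == row(s + a):
                    delta_T[(s, a)] = r
    # one state per column: the set of RED elements with a 1 in that column
    Q = {frozenset(s for s in red if obs[(s, e)] == 1) for e in E}
    Q0 = {q for q in Q if all(obs[(s, "")] == 1 for s in q)}
    F = {q for q in Q if "" in q}
    delta_R = {}
    for q1, a in product(Q, alphabet):
        pre = {s for s in red if delta_T.get((s, a)) in q1}   # reversed transition
        delta_R[(q1, a)] = {q2 for q2 in Q if q2 <= pre}
    return Q, Q0, F, delta_R

def accepts(Q0, F, delta_R, word):
    current = set(Q0)
    for a in word:
        current = {t for q in current for t in delta_R[(q, a)]}
    return bool(current & F)

# Pruned table for the reversal of L (words starting with "a"): RED = {ε, a}, E = {ε, a}.
red, E = ["", "a"], ["", "a"]
obs = {(s, e): int((s + e).startswith("a"))
       for s in ["", "a", "b", "aa", "ab"] for e in E}
Q, Q0, F, delta_R = rfsa_from_table(red, E, obs, "ab")
```

In this example the two resulting states correspond to the two prime residual languages of $L$, and the derived NFA accepts exactly the words ending in "a".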

Observe that every state in $Q_R$ corresponds to a column in $T$. As every element of RED represents an
equivalence class of $\overline{L}$ under the Myhill-Nerode relation every state in $Q_R$ also corresponds to a unique
set of equivalence classes, and the associated column represents the characteristic function of that set.
We show that $\mathcal{R}$ is the canonical RFSA for $L$. The proof uses Theorem 2.

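Definition 8 Let $A = (\Sigma, Q, Q_0, F, \delta)$ be an NFA, and define $Q^{\diamond} := \{p \subseteq Q \mid \exists w \in \Sigma^* : \delta(Q_0, w) = p\}$.
A state $q \in Q^{\diamond}$ is said to be coverable iff there exist $q_1, \ldots, q_n \in Q^{\diamond} \setminus \{q\}$ for $n \geq 1$ such that $q = \bigcup_{i=1}^{n} q_i$.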

Theorem 2 (Cited from [4]). Let $L$ be regular and let $B = (\Sigma, Q_B, Q_{B0}, F_B, \delta_B)$ be an NFA such that $\overline{B}$ is
a RFSA recognizing $\overline{L}$ whose states are all reachable. Then $C(B) = (\Sigma, Q_C, Q_{C0}, F_C, \delta_C)$ with $Q_C = \{p \in
Q_B^{\diamond} \mid p$ is not coverable$\}$, $Q_{C0} = \{p \in Q_C \mid p \subseteq Q_{B0}\}$, $F_C = \{p \in Q_C \mid p \cap F_B \neq \emptyset\}$, and $\delta_C(p, a) = \{p' \in
Q_C \mid p' \subseteq \delta_B(p, a)\}$ for $p \in Q_C$ and $a \in \Sigma$ is the canonical RFSA recognizing $L$.

As a further important result it has also been shown in [4], Section 5, that in a RFSA for some regular
language $L$ whose states are all reachable the non-coverable states correspond exactly to the prime RLs
of $L$ and that consequently $Q_C$ can be identified with the set of states of the canonical RFSA for $L$.

Lemma 3 (See [4], Prop. 1). Let $A = (\Sigma, Q, Q_0, F, \delta)$ be a RFSA. For every prime RL $w^{-1}\mathcal{L}(A)$ there
exists a state $q \in \delta(Q_0, w)$ such that $L_q = w^{-1}\mathcal{L}(A)$.

Theorem 4 $\mathcal{R}$ is the canonical RFSA for $L$.

Proof. $\overline{\mathcal{A}_T}$ meets the conditions for $B$ in Theorem 2 as (a) all states of $\mathcal{A}_T$ are reachable because $\mathcal{A}_T$
contains no useless states, (b) $\mathcal{A}_T$ is a RFSA: every DFA without useless states is a RFSA (see [4]),
and (c) $\mathcal{L}(\mathcal{A}_T) = \overline{L}$. As $\mathcal{A}_T$ contains no useless states $\mathcal{A}_T$ and $\overline{\mathcal{A}_T}$ have the same number of states
and transitions, so we can set $B = \overline{\mathcal{A}_T} = (\Sigma, Q_T, F_T, Q_{T0}, \overline{\delta_T})$. Assuming for now that there is indeed a
bijection between $Q_R$ and $Q_C$ it is rather trivial to see that

• there is a bijection between $Q_{R0} = \{q \in Q_R \mid \forall x \in q : obs'(x, \varepsilon) = 1\}$ and $Q_{C0} = \{p \in Q_C \mid p \subseteq F_T\}$
due to $F_T = \{x \in \mathrm{RED} \mid obs'(x, \varepsilon) = 1\}$,

• there is a bijection between $F_R = \{q \in Q_R \mid \varepsilon \in q\}$ and $F_C = \{p \in Q_C \mid p \cap Q_{T0} \neq \emptyset\}$ due to the fact
that $Q_{T0} = \{\varepsilon\}$, and that

• for every $q \in Q_R$, $p \in Q_C$, and $a \in \Sigma$ such that $q$ is the image of $p$ under the bijection between $Q_R$
and $Q_C$, $\delta_R(q, a) = \{q_2 \in Q_R \mid q_2 \subseteq \overline{\delta_T}(q, a)\}$ is the image of $\delta_C(p, a) = \{p' \in Q_C \mid p' \subseteq \overline{\delta_T}(p, a)\}$.

It remains to show that there is a bijection between $Q_R$ and the set of prime RLs of $L$, i.e., $Q_C$. From
the definition of $Q_R$ it is clear that $\mathcal{R}$ is a RFSA: As noted above, every state in $Q_R$ corresponds to a
column in $T$, labeled by a context $e \in E$, and also to the set of equivalence classes $[s]_{\overline{L}}$ such that $se \in \overline{L}$
for $s \in \mathrm{RED}$. As a consequence the reversal of the union of this set of equivalence classes equals the RL
$\overline{e}^{-1}L$, and hence every state in $Q_R$ corresponds to exactly one RL of $L$. According to Lemma 3 there is
a state in $Q_R$ for each prime RL of $L$, so every prime RL of $L$ is represented by exactly one column in $T$.
By (3) we have eliminated the columns that are covered by other columns in the table. If a column is
not coverable in the table the corresponding state in $Q_R$ is not coverable either: Consider a column in the
table which can be covered by a set of columns of which at least some do not occur in the table. Due to
Lemma 3 these columns can only correspond to composed RLs of $L$. If we were to add representatives
of these columns to the table they would have to be eliminated again directly because of the restrictions
imposed by (3). This means that if a column is coverable at all it can always be covered completely by
restricting oneself to columns that correspond to prime RLs of $L$ as well, and these are all represented in
the table. Therefore $Q_R$ cannot contain any coverable states.

Thus the correspondence between $Q_R$ and the set of prime RLs of $L$ is one-to-one, and we have shown
that $\mathcal{R}$ is isomorphic to the canonical RFSA for $L$. ■

Corollary 5 Let $L$ be a regular language. The number of prime RLs of $L$ is the minimal number of
contexts needed to distinguish between the states of $\mathcal{A}_{\overline{L}}$.

Also note that we can skip the modification (3) in the second part of our algorithm if we restrict the 
target to bideterministic regular languages (see flU). 

3.2 Comparison to other algorithms: Query complexity 

An advantage of the algorithm described above is the trivial fact that it benefits from any past, present, 
and future research on algorithms that infer minimal DFAs via observation tables, and at least until now 
there is a huge gap between the amount of research that has been done on algorithms inferring DFAs and 
the amount of research on algorithms inferring NFAs - or RFSAs, for that matter. 

A point of interest in connection with the concepts presented is the study of further kinds of information
sources that could be used as input and in particular suitable combinations thereof (see for example
[12] for a tentative discussion).

Another point of interest is complexity. As the second part of the algorithm consists only of cheap
comparisons of 0s and 1s, of which (3) is the most complex, the determining factor is the complexity
of the chosen underlying algorithm. One of the standard criteria for evaluating an algorithm is its time 
complexity, but depending on the different learning settings there are other measures that can be taken 
into consideration as well, one of which we will briefly address. 

For algorithms that learn via queries a natural criterion is the number of queries needed. The
prototypical query learning algorithm, Angluin's [1] algorithm L*, which can be seen in a slightly adapted
version $L^*_{col}$ in Figure 2, needs $O(I_L)$ equivalence queries and $O(|\Sigma| \cdot |c_0| \cdot I_L^2)$ membership queries, where
$I_L$ is the index of $L \subseteq \Sigma^*$ and $|c_0|$ the length of the longest given counterexample. By modifications
the number of MQs can be improved to $O(|\Sigma| I_L^2 + I_L \log |c_0|)$ which according to [10] is optimal up to
constant factors. On the other hand, it has been shown in [11] that it is possible to decrease the number
of EQs to sublinearity at the price of increasing the number of MQs exponentially.

Recently, Bollig et al. [8] have presented a MAT algorithm for RFSAs using an observation table
that keeps very close to the deterministic variant $L^*_{col}$ mentioned above. They introduce the notions of
RFSA-closedness and -consistency.

Definition 9 Let $T = (S, E, obs)$ be an observation table. A row labeled by $s \in S$ is coverable iff
$\exists s_1, \ldots, s_n \in S$ (is coverable by the rows of $s_1, \ldots, s_n$ iff)

$\forall e \in E : [obs(s,e) = 0 \Rightarrow \forall i \in \{1, \ldots, n\} : obs(s_i,e) = 0] \wedge$
$[obs(s,e) = 1 \Rightarrow \exists i \in \{1, \ldots, n\} : obs(s_i,e) = 1].$

initialize $T := (S, E, obs)$ with $S = \mathrm{RED} \cup \mathrm{BLUE}$ and $\mathrm{BLUE} = \mathrm{RED} \cdot \Sigma$
        by $\mathrm{RED} := \{\varepsilon\}$ and $E := \{\varepsilon\}$
repeat until EQ = yes
    while $T$ is not closed or not consistent
        if $T$ is not closed
            find $s \in \mathrm{BLUE}$ such that $row(s) \notin row(\mathrm{RED})$
            $\mathrm{RED} := \mathrm{RED} \cup \{s\}$ (and update the table via MQs)
        if $T$ is not consistent
            find $s_1, s_2 \in \mathrm{RED}$, $a \in \Sigma$, $e \in E$ such that $s_1a, s_2a \in S$
                and $\neg(s_1 <> s_2)$ and $obs(s_1a, e) \neq obs(s_2a, e)$
            $E := E \cup \{ae\}$ (and update the table via MQs)
    perform equivalence test
    if EQ = no get counterexample $c \in (L \setminus \mathcal{L}(\mathcal{A}_T)) \cup (\mathcal{L}(\mathcal{A}_T) \setminus L)$
        $E := E \cup Suff(c)$ (and update the table via MQs)
return $\mathcal{A}_T$

Figure 2: $L^*_{col}$
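
The loop of Figure 2 can be turned into a short runnable sketch. Everything here is an assumption made for illustration: the names `lstar_col` and `bounded_eq` are hypothetical, the equivalence oracle is replaced by a bounded exhaustive comparison up to a fixed word length, and the target (words over {a, b} ending in "ab") is a toy language. Since RED elements are only added when their row is new, RED rows should stay pairwise distinct in this variant, so the consistency branch is kept mainly to mirror the figure.

```python
from itertools import product

def lstar_col(alphabet, member, find_counterexample):
    """member answers MQs; find_counterexample(accepts) returns a word or None (EQ = yes)."""
    red, E, obs = [""], [""], {}

    def row(s):
        for e in E:
            if (s, e) not in obs:
                obs[(s, e)] = int(member(s + e))      # membership query
        return tuple(obs[(s, e)] for e in E)

    def hypothesis():
        rep = {}
        for r in red:
            rep.setdefault(row(r), r)                 # one RED representative per row
        def accepts(w):
            s = ""
            for a in w:
                s = rep[row(s + a)]                   # defined because the table is closed
            return row(s)[0] == 1                     # E[0] is the empty context
        return accepts

    while True:
        changed = True
        while changed:
            changed = False
            red_rows = {row(r) for r in red}
            for s in [r + a for r in red for a in alphabet]:
                if row(s) not in red_rows:            # not closed: promote s to RED
                    red.append(s)
                    changed = True
                    break
            if changed:
                continue
            for s1, s2, a in product(red, red, alphabet):
                if row(s1) == row(s2) and row(s1 + a) != row(s2 + a):
                    e = next(e for e in E if obs[(s1 + a, e)] != obs[(s2 + a, e)])
                    E.append(a + e)                   # not consistent: add context ae
                    changed = True
                    break
        accepts = hypothesis()
        c = find_counterexample(accepts)
        if c is None:
            return accepts, red, E
        for i in range(len(c) + 1):                   # add all suffixes of c to E
            if c[i:] not in E:
                E.append(c[i:])

def bounded_eq(member, alphabet, max_len):
    """Stand-in for an EQ oracle: compare hypothesis and target on all short words."""
    def find(accepts):
        for n in range(max_len + 1):
            for w in map("".join, product(alphabet, repeat=n)):
                if accepts(w) != member(w):
                    return w
        return None
    return find

member = lambda w: w.endswith("ab")
accepts, red, E = lstar_col("ab", member, bounded_eq(member, "ab", 6))
```

On this toy target the sketch ends with three RED elements, one per state of the minimal DFA.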

Let $ncov(S) \subseteq row(S)$ be the set of non-coverable rows labeled by elements in $S$.

Definition 10 Let $T = (S, E, obs)$ be an observation table. We say that a row $r \in row(S)$ includes another
row $r' \in row(S)$, denoted by $r' \sqsubseteq r$, iff $obs(s', e) = 1 \Rightarrow obs(s, e) = 1$ for all $e \in E$ and $s, s' \in S$ with
$row(s) = r$ and $row(s') = r'$.

Definition 11 A table $T = (\mathrm{RED} \cup \mathrm{BLUE}, E, obs)$ is RFSA-closed iff every row $r \in row(\mathrm{BLUE})$ is coverable
by some rows $r_1, \ldots, r_n \in ncov(\mathrm{RED})$.

Definition 12 A table $T = (\mathrm{RED} \cup \mathrm{BLUE}, E, obs)$ is RFSA-consistent iff $row(s_1) \sqsubseteq row(s_2)$ implies
$row(s_1a) \sqsubseteq row(s_2a)$ for all $s_1, s_2 \in S$ and all $a \in \Sigma$.
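
Definitions 9 to 12 can be checked mechanically on rows stored as 0/1 tuples. The following is a sketch under assumptions: all function names are hypothetical, a row counts as coverable only if it is the componentwise OR of a non-empty set of other rows, and for RFSA-closedness a BLUE row that equals a non-coverable RED row counts as covered by that row.

```python
from itertools import product

def leq(r1, r2):
    """Row inclusion: every 1 of r1 is also a 1 of r2."""
    return all(x <= y for x, y in zip(r1, r2))

def covered_by(r, rows):
    """True iff r is the componentwise OR of the (non-empty) set of given rows below it."""
    below = [q for q in rows if leq(q, r)]
    return bool(below) and tuple(int(any(bits)) for bits in zip(*below)) == r

def ncov(subset, all_rows):
    """Rows of the subset that are not coverable by the other rows of the table."""
    return {r for r in subset if not covered_by(r, set(all_rows) - {r})}

def is_rfsa_closed(red_rows, blue_rows):
    primes = ncov(red_rows, set(red_rows) | set(blue_rows))
    return all(covered_by(r, primes) for r in blue_rows)

def is_rfsa_consistent(row, S, alphabet):
    """row: dict from the strings in S (and the available sa) to their row tuples."""
    for s1, s2, a in product(S, S, alphabet):
        if leq(row[s1], row[s2]) and s1 + a in row and s2 + a in row:
            if not leq(row[s1 + a], row[s2 + a]):
                return False
    return True

# Hypothetical rows over three contexts.
red_rows = {(1, 0, 0), (0, 1, 0), (1, 1, 1)}
closed = is_rfsa_closed(red_rows, {(1, 1, 0)})        # (1,1,0) = (1,0,0) OR (0,1,0)
not_closed = is_rfsa_closed(red_rows, {(0, 0, 1)})    # nothing below (0,0,1)
```

Compared with ordinary closedness, a BLUE row here does not need an identical RED row as long as it can be assembled from non-coverable RED rows.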

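From a RFSA-closed and -consistent table $T = (\mathrm{RED} \cup \mathrm{BLUE}, E, obs)$ Bollig et al. derive an NFA $\mathcal{R}_T =
(\Sigma, Q_R, Q_{R0}, F_R, \delta_R)$ defined by $Q_R = ncov(\mathrm{RED})$, $Q_{R0} = \{r \in Q_R \mid r \sqsubseteq row(\varepsilon)\}$, $F_R = \{r \in Q_R \mid \forall s \in \mathrm{RED} :
row(s) = r \Rightarrow obs(s, \varepsilon) = 1\}$, and $\delta_R(row(s), a) = \{r \in Q_R \mid r \sqsubseteq row(sa)\}$ with $row(s) \in Q_R$ and $a \in \Sigma$.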

Theorem 6 (See [8]). Let $T$ be a RFSA-closed and -consistent table and $\mathcal{R}_T$ the NFA derived from $T$.
Then $\mathcal{R}_T$ is a canonical RFSA for the target language.

See [8] for the proof. The algorithm NL* by Bollig et al. is given in Figure 3.

The theoretical query complexity of NL* amounts to at most $O(I_L^2)$ EQs and $O(|\Sigma| \cdot |c_0| \cdot I_L^3)$ MQs.
This exceeds the maximal number of queries needed by $L^*_{col}$ in both cases, which is due to the fact
that with NL* adding a context does not always lead to a direct increase of the number of states in
the automaton derived from the table. Note that the authors of [8] show that their algorithm statistically
outperforms $L^*_{col}$ in practice, which is partly due to the fact that the canonical RFSA is often much smaller
than the canonical DFA (see [4]). Nevertheless it is noteworthy that inferring an automaton
with potentially exponentially fewer states than the minimal DFA seems to be at least as complex.
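
To get a feeling for the gap between the two worst-case bounds, the expressions can be evaluated for concrete parameters. This is only a rough illustration: the function names are hypothetical, all constant factors hidden by the O-notation are ignored, and the parameter values are arbitrary.

```python
def lstar_bounds(sigma, c0, index):
    """Worst-case (EQs, MQs) for L*_col up to constant factors."""
    return index, sigma * c0 * index ** 2

def nlstar_bounds(sigma, c0, index):
    """Worst-case (EQs, MQs) for NL* up to constant factors."""
    return index ** 2, sigma * c0 * index ** 3

# Arbitrary example: |Sigma| = 2, longest counterexample of length 10, index 8.
print(lstar_bounds(2, 10, 8))    # (8, 1280)
print(nlstar_bounds(2, 10, 8))   # (64, 10240)
```

Both NL* bounds are larger by a factor of $I_L$, which says nothing about typical behavior, where the smaller canonical RFSA can work in favor of NL*.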

Inspired by [8] we propose another parasitic two-step algorithm that uses an existing algorithm with
access to a membership oracle to establish a table $T' = (\mathrm{RED}' \cup \mathrm{BLUE}', E', obs')$ representing $\mathcal{A}_L$ and
modifies it as follows:

initialize $T := (S, E, obs)$ with $S = \mathrm{RED} \cup \mathrm{BLUE}$ and $\mathrm{BLUE} = \mathrm{RED} \cdot \Sigma$
        by $\mathrm{RED} := \{\varepsilon\}$ and $E := \{\varepsilon\}$
repeat until EQ = yes
    while $T$ is not RFSA-closed or not RFSA-consistent
        if $T$ is not RFSA-closed
            find $s \in \mathrm{BLUE}$ such that $row(s) \in ncov(S) \setminus ncov(\mathrm{RED})$
            $\mathrm{RED} := \mathrm{RED} \cup \{s\}$ (and update the table via MQs)
        if $T$ is not RFSA-consistent
            find $s \in S$, $a \in \Sigma$, $e \in E$ such that $obs(sa, e) = 0$ and
                $obs(s'a, e) = 1$ for some $s' \in S$ with $row(s') \sqsubseteq row(s)$
            $E := E \cup \{ae\}$ (and update the table via MQs)
    perform equivalence test
    if EQ = no get counterexample $c \in (L \setminus \mathcal{L}(\mathcal{R}_T)) \cup (\mathcal{L}(\mathcal{R}_T) \setminus L)$
        $E := E \cup Suff(c)$ (and update the table via MQs)
return $\mathcal{R}_T$

Figure 3: NL*, the NFA (RFSA) version of $L^*_{col}$

(2)' Eliminate all representatives of rows and columns containing only 0s. Let $T'' = (\mathrm{RED}'' \cup \mathrm{BLUE}'',
E'', obs'')$ be the resulting table.

(3)' For every $s \in \mathrm{RED}''$ and every final state $q_F$ of $\mathcal{A}_{T''}$ add an (arbitrary) string $e$ to $E''$ such that
$\delta_{T''}(row(s), e) = \{q_F\}$. Fill up the table via MQs.

Let $T = (\mathrm{RED} \cup \mathrm{BLUE}, E, obs)$ be the resulting table. Note that as $T$ already contains the maximal number
of possible distinct rows $T$ is still closed and therefore RFSA-closed. $T$ is RFSA-consistent as well:
Recall that every element $s \in S$ represents a RL $s^{-1}L$ of $L$ (see Section 2). If $T$ was not RFSA-consistent we
could find elements $s_1, s_2 \in S$, $e \in E$, and $a \in \Sigma$ with $row(s_1) \sqsubseteq row(s_2)$ but $obs(s_1a, e) = 1 \wedge obs(s_2a, e) = 0$.
However, $ae \in s_1^{-1}L$ and $row(s_1) \sqsubseteq row(s_2)$ imply that $ae \in s_2^{-1}L$, and hence $obs(s_2a, e) = 0$ cannot
be true.

From $T$ we derive an automaton $\mathcal{R} = (\Sigma, Q_R, Q_{R0}, F_R, \delta_R)$ as in [8] (see above). The NFA $\mathcal{R}$ is the
canonical RFSA for $L$. This follows directly from Theorem 6 and the fact that $T$ contains a representative
for every RL of $L$.

The algorithm outlined above needs $I_L \cdot |F_L|$ MQs in addition to the queries needed by the algorithm
establishing the original table but it does not require any more EQs. As EQs are usually deemed very
expensive this can be counted in its favor. Also note that if we restrict the target to bideterministic languages
the table does not have to be modified and no additional queries have to be asked. 

4 Conclusion 

Two-step algorithms have the advantage of modularity: Their components can be exchanged and improved
individually and therefore more easily adapted to different settings and inputs whereas non-modular
algorithms are generally stuck with their parameters. One may doubt the efficiency of our
two-step algorithms by observing that the second step partly destroys the work of the first, but as long as
algorithms inferring the minimal DFA are much less complex than the ones inferring the minimal RFSA
the two-step version outperforms the direct one.

It seems easy to adapt NL* to other learning settings such as learning from positive data and a membership
oracle or from positive and negative data in order to establish a more universal pattern for algorithms
that infer a RFSA via an observation table similar to the generalization for DFAs attempted in [12].